A General Theory of Correct, Incorrect, and Extrinsic Equivariance

Neural Information Processing Systems

Although equivariant machine learning has proven effective at many tasks, success depends heavily on the assumption that the ground truth function is symmetric over the entire domain matching the symmetry in an equivariant neural network. A missing piece in the equivariant learning literature is the analysis of equivariant networks when symmetry exists only partially in the domain. In this work, we present a general theory for such a situation. We propose pointwise definitions of correct, incorrect, and extrinsic equivariance, which allow us to quantify continuously the degree of each type of equivariance a function displays. We then study the impact of various degrees of incorrect or extrinsic symmetry on model error. We prove error lower bounds for invariant or equivariant networks in classification or regression settings with partially incorrect symmetry. We also analyze the potentially harmful effects of extrinsic equivariance.
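
To make the three pointwise categories concrete, here is a minimal sketch on a toy finite domain. It is not the paper's formal definition. It assumes one reading of the abstract: a transformed point that falls outside the data support counts as extrinsic, and otherwise the point is correct or incorrect depending on whether the ground truth agrees with the model symmetry there. The names `equivariance_type`, `equivariance_mass`, and the toy `support` and `f` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def equivariance_type(x, f, density, act_in, act_out):
    """Label one point x for one group element (assumed reading of the abstract)."""
    gx = act_in(x)
    if density(gx) == 0:
        # The transformed point lies outside the data support.
        return "extrinsic"
    # Inside the support: does the ground truth respect the symmetry here?
    return "correct" if f(gx) == act_out(f(x)) else "incorrect"


def equivariance_mass(support, f, act_in, act_out):
    """Probability mass of each category under a discrete data distribution."""
    density = lambda x: support.get(x, Fraction(0))
    mass = defaultdict(Fraction)
    for x, p in support.items():
        mass[equivariance_type(x, f, density, act_in, act_out)] += p
    return dict(mass)


# Hypothetical toy problem: uniform data on five integers, sign-flip symmetry,
# and an invariant model (the output action is the identity).
support = {x: Fraction(1, 5) for x in (-2, -1, 1, 2, 3)}
f = lambda x: 1 if x in (-1, 1, 2) else 0  # ground-truth labels
mass = equivariance_mass(support, f, act_in=lambda x: -x, act_out=lambda y: y)
print(mass)
```

In this toy case the pair (-1, 1) shares a label, the pair (-2, 2) does not, and 3 has no mirrored partner in the data, so the mass splits into 2/5 correct, 2/5 incorrect, and 1/5 extrinsic.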


A General Theory of Equivariant CNNs on Homogeneous Spaces

Neural Information Processing Systems

We present a general theory of Group equivariant Convolutional Neural Networks (G-CNNs) on homogeneous spaces such as Euclidean space and the sphere. Feature maps in these networks represent fields on a homogeneous base space, and layers are equivariant maps between spaces of fields. The theory enables a systematic classification of all existing G-CNNs in terms of their symmetry group, base space, and field type. We also answer a fundamental question: what is the most general kind of equivariant linear map between feature spaces (fields) of given types? We show that such maps correspond one-to-one with generalized convolutions with an equivariant kernel, and characterize the space of such kernels.
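
The simplest instance of the convolution-equivariance correspondence is the cyclic group acting on a 1-D signal by shifts. The sketch below is an illustration under that assumption, not the paper's general construction. It checks that a circular convolution commutes with cyclic shifts, while a position-dependent linear map does not. All function names are hypothetical.

```python
def cyclic_shift(signal, s):
    """Group action: shift a periodic signal by s positions."""
    n = len(signal)
    return [signal[(i - s) % n] for i in range(n)]


def circular_conv(signal, kernel):
    """Circular convolution: the same kernel is applied at every position."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i - j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def pointwise_scale(signal):
    """A linear map whose weights depend on position, so it is not a convolution."""
    return [i * v for i, v in enumerate(signal)]


def commutes_with_shift(layer, signal, s):
    """Equivariance test: layer(shift(x)) == shift(layer(x))."""
    return layer(cyclic_shift(signal, s)) == cyclic_shift(layer(signal), s)


x = [1, 4, -2, 0, 3, 5, -1, 2]
k = [2, -1, 3]
conv_ok = commutes_with_shift(lambda v: circular_conv(v, k), x, 3)
scale_ok = commutes_with_shift(pointwise_scale, x, 3)
print(conv_ok, scale_ok)
```

Integer inputs keep the comparison exact. The convolution passes the check and the position-dependent map fails it, which is the one-dimensional shadow of the statement that equivariant linear maps are exactly the (generalized) convolutions.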


REWA: A General Theory of Witness-Based Similarity

Phadke, Nikit

arXiv.org Artificial Intelligence

We present a universal framework for similarity-preserving encodings that subsumes all discrete, continuous, algebraic, and learned similarity methods under a single theoretical umbrella. By formulating similarity as functional witness projection over monoids, we prove that \( O\!\left(\frac{1}{\Delta^{2}}\log N\right) \) encoding complexity with ranking preservation holds for arbitrary algebraic structures. This unification reveals that Bloom filters, Locality Sensitive Hashing (LSH), Count-Min sketches, Random Fourier Features, and Transformer attention kernels are instances of the same underlying mechanism. We provide complete proofs with explicit constants under 4-wise independent hashing, handle heavy-tailed witnesses via normalization and clipping, and prove \( O(\log N) \) complexity for all major similarity methods from 1970-2024. We give explicit constructions for Boolean, Natural, Real, Tropical, and Product monoids, prove tight concentration bounds, and demonstrate compositional properties enabling multi-primitive similarity systems.


Accelerated Stochastic Matrix Inversion: General Theory and Speeding up BFGS Rules for Faster Second-Order Optimization

Neural Information Processing Systems

We present the first accelerated randomized algorithm for solving linear systems in Euclidean spaces. One essential problem of this type is the matrix inversion problem. In particular, our algorithm can be specialized to invert positive definite matrices in such a way that all iterates (approximate solutions) generated by the algorithm are positive definite matrices themselves. This opens the way for many applications in the field of optimization and machine learning. As an application of our general theory, we develop the first accelerated (deterministic and stochastic) quasi-Newton updates. Our updates lead to provably more aggressive approximations of the inverse Hessian, and lead to speed-ups over classical non-accelerated rules in numerical experiments. Experiments with empirical risk minimization show that our rules can accelerate training of machine learning models.
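
For context, the sketch below shows the classical, non-accelerated BFGS inverse update used as a matrix-inversion iteration. It is not the accelerated method of the paper. It assumes a small positive definite matrix `A`, update directions that cycle through the coordinate vectors (a stand-in for random sketches), and the secant pair `y = A s`. All names are hypothetical.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bfgs_inverse_update(H, s, y):
    """Classical BFGS update of a symmetric inverse estimate H so that H_new y = s."""
    n = len(s)
    rho = 1.0 / dot(y, s)  # needs y.s > 0, which holds when A is positive definite
    Hy = matvec(H, y)      # H is symmetric, so this also gives (y^T H)^T
    yHy = dot(y, Hy)
    # Expanded form of (I - rho s y^T) H (I - rho y s^T) + rho s s^T
    return [
        [
            H[i][j]
            - rho * (s[i] * Hy[j] + Hy[i] * s[j])
            + (rho * rho * yHy + rho) * s[i] * s[j]
            for j in range(n)
        ]
        for i in range(n)
    ]


def is_pd_2x2(M):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    return M[0][0] > 0 and M[0][0] * M[1][1] - M[0][1] * M[1][0] > 0


A = [[4.0, 1.0], [1.0, 3.0]]   # toy positive definite matrix
H = [[1.0, 0.0], [0.0, 1.0]]   # initial inverse estimate (identity)
all_pd = True
for step in range(40):
    s = [1.0 if i == step % 2 else 0.0 for i in range(2)]  # coordinate direction
    y = matvec(A, s)                                       # secant pair y = A s
    H = bfgs_inverse_update(H, s, y)
    all_pd = all_pd and is_pd_2x2(H)

# Residual of H A against the identity
residual = max(
    abs(dot(H[i], [A[k][j] for k in range(2)]) - (1.0 if i == j else 0.0))
    for i in range(2)
    for j in range(2)
)
print(all_pd, residual)
```

Each update enforces the secant equation for the current direction while keeping the estimate symmetric and positive definite, which is the property the abstract highlights for its iterates. In this toy run the estimate approaches the true inverse of `A`.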


Reviews: A General Theory of Equivariant CNNs on Homogeneous Spaces

Neural Information Processing Systems

I would give an accept score if I were able to have a look at the new version and be happy with it (as is possible in openreview settings for example). However, since improving the presentation usually takes a lot of work and it is not possible for me to verify in which way the improvements have actually been implemented, I will bump it to a 5. I do think readability and clarity are key for impact as written in my review, which is the main reason I gave a much lower score than other reviewers, some of whom have worked on exactly this intersection of algebra and G-CNNs themselves and provided valuable feedback on the content from an expert's perspective. The following comments are based on the reviewer's personal definition of clarity and good quality of presentation: that most of the time when following the paper from start to end it is clear to the reader why each paragraph is written and how it links to the objective of the main results of the paper, here claimed e.g. in the last sentence to be the development of new equivariant network architectures. The paper is one long lead-up of three pages of definitions of mathematical terms and symbols to the theorems in section 6 on equivariant kernels, which represent the core results of the paper. In general, I appreciate rigorous frameworks which generalize existing methods, especially if they provide insight and enable the design of an arbitrary new instance that fits in the framework (in this case transformations on arbitrary fields).


Reviews: Accelerated Stochastic Matrix Inversion: General Theory and Speeding up BFGS Rules for Faster Second-Order Optimization

Neural Information Processing Systems

This paper presents an accelerated version of the sketch-and-project algorithm, an accelerated algorithm for matrix inversion and accelerated variants of deterministic and stochastic quasi-Newton updates. I strongly believe that this line of research is of interest for the ML and optimization communities, and that the algorithms and theoretical results presented in this paper are significant and novel. Moreover, the numerical results presented in the paper clearly illustrate the effectiveness of the approaches presented in the paper. For this reason, I strongly recommend this paper for publication. Below I list my minor concerns with the paper.